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The preparation of the stationary distribution of irreducible, time-reversible Markov chains is a 
fundamental building block in many heuristic approaches to algorithmically hard problems. It has 
been conjectured that quantum analogs of classical mixing processes may offer a generic quadratic 
speed-up in realizing such stationary distributions. Such a speed-up would also imply a speed-up of 
a broad family of heuristic algorithms. However, a true quadratic speed up has thus far only been 
demonstrated for special classes of Markov chains. These results often presuppose a regular structure 
of the underlying graph of the Markov chain, and also a regularity in the transition probabilities. 
In this work, we demonstrate a true quadratic speed-up for a class of Markov chains where the 
restriction is only on the form of the stationary distribution, rather than directly on the Markov 
chain structure itself. In particular, we show efficient mixing can be achieved when it is beforehand 
known that the distribution is monotonically decreasing relative to a known order on the state space. 
Following this, we show that our approach extends to a wider class of distributions, where only a 
fraction of the shape of the distribution is known to be monotonic. Our approach is built on the 
Szegedy-type quantization of transition operators. 


I. INTRODUCTION 

Quantum walks have, amongst other reasons, been long investigated for their capacity to speed up mixing 
processes - that is, speeding up the task of preparing stationary distributions of a given Markov chain (MC). 
Efficient mixing is a much coveted property in the context Markov Chain Monte Carlo (MCMC) approaches to 
many algorithmic methods for hard combinatorial problems and problems arising in statistical physics [1]. In 
the case of time-reversible Markov chains it is well-known that the bound on mixing times is tight relative to the 
spectral gap S of the Markov chain - both the lower and the upper bounds on the (approximate) mixing times 
are proportional to 1/6, whereas other quantities (e.g. the allowed error of the approximation) appear only as 
logarithmic factors. Improvements in attaining target distributions then, in the classical case, always stem from 
additional constructions: e.g. by utilizing sequences of slowly evolving MCs in simulated annealing, or by using 
alternative MCs which mix faster toward the same distribution (e.g. graph lifting [2]). Such approaches are not 
proven to help generically, and their utility is established, essentially, on a case-by-case basis. 

However, even without resorting to additional structures it is possible that here quantum mechanics may help 
generically. By employing quantum analogs of transition operators of MCs, speed-ups of mixing times already 
been proven [3, 4] in the cases where the underlying transition graph corresponds to periodic lattices and the 
torus. In these works, the quantum operator employed was Ut = exp{—iDpt), where Dp is the discriminant 
operator of the (time-reversible) transition operator P, which is equal to P itself in the cases when the stationary 
distribution of P is uniform. The exact definition oi Dp will be given later. These results contribute towards a 
working conjecture that quantum transition operators may offer a generic quadratic speed-up in mixing times 

Alternative approaches to quantum mixing, based on the Szegedy quantum transition operator, have been 
already proposed by Richter^, based on observations by Childs [3, 5]. In particular, it has been observed that 
so-called hitting algorithms, which attempt to find a particular element in the state space, by starting from the 
stationary distribution of a MC, and using the transition operator, may be run in reverse to realize a mixing 
algorithm. However, to our knowledge, these approaches were not pursued further due to their inefficiency. In 
such an approach, the mixing time has a prohibitive dependance on the probabilities occurring in the stationary 
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distribution, which lead an additional lower bound dependence of on the state space size N. Nonetheless, 

Szegedy-style walk operators have been successfully employed in other context, mostly relying on decreasing 
so-called hitting times of random walks [6-9]. 

In this work we re-evaluate the approach based on the Szegedy quantum transition operator (as outlined 
by Richter in [3]), and show how the lower bound state-space-size dependence of Q{'/N) can be exponentially 
improved to 0(log^^^(A^)) in the case when additional knowledge is available on the shape of the stationary 
distribution. The outline of the paper is as follows: in section II we give the preliminaries and set up the 
notation. Following this, in section III we explain the main result for a special case of monotonically decaying 
distributions. In section IV we explain when and how the result can be extended to a much wider class of 
distributions, elaborate why similar techniques cannot be useful in classical mixing problems, and also prove 
the optimality of our approach. We hnish off in section V with a brief discussion. 


II. PRELIMINARIES 

We begin by a brief recap of the basic concepts and results in discrete-time, time-homogeneous Markov chain 
theory. A discrete-time, time-homogeneous Markov chain is characterized with a transition matrix (operator) 
P which acts on a state space S oi N states. P can be represented by a left-stochastic matrix (a matrix 
with non-negative, real entries with columns adding to one). P, with an initial distribution, fully specify a 
Markov chain and we will often refer to P as the transition matrix and the Markov chain, interchangeably. If 
P is irreducible (that is, P is an adjacency matrix of a strongly connected graph) and aperiodic (the greatest 
common divisor of the periods of all states is 1), then there exists a unique stationary distribution tt, such that 
Ptt = tt We will represent distributions as a non-negative column vector tt = (tt^))^]^, £ R)}', such that 

TTi = 1. Irreducible and aperiodic MCs mix: a sequential application of P onto any initial state in the limit 
yields the stationary distribution. More precisely, it holds that limt_>oo = tt, for all initial distributions a . 
This property is sometimes referred to as the fundamental theorem of Markov chains. 

In this work we will focus on time-reversible, irreducible and aperiodic Markov chains. A Markov chain P 
with stationary distribution tt is said to be time-reversible if it satisfies detailed balance: 

'^iPij — bj’ (^) 

More generally, for an irreducible, aperiodic Markov chain P, over the state space of size N with stationary 
distribution tt, we dehne the time-reversed Markov chain P* with P* = M{tt)P ’^where M is the 
diagonal matrix M = diagini,... jTTn).^ Then, P is time-reversible if P = P*. The discriminant matrix Dp 
is then defined with Dp = M^/^(7r)P’^M(7r)“^/^, and it can be shown that it is always symmetric for time- 
reversible Markov chains. Since a time-reversible transition matrix P is similar to a symmetric matrix, its 
eigenvalues are real, and also by the Perron-Frobenius theorem they are less or equal to 1 in absolute value 
(value I being reserved for the stationary distribution which is also the -1-1 eigenvector). 

If A 2 denotes the second largest eigenvalue (in absolute value) then with S = 1 — IA 2 I we will denote the 
spectral gap of the Markov chain P. 

Next, the mixing time T(e), within error e, for P is defined as: 

r(e) = min {t|P(P*cr, tt) < e,Vcr} (2) 

where D{tt,<7) denotes the total variational distance on distributions tt , cr, so D{'K,a) = 1/2^^ [ tt ^ — Cj], or, 
more generally, the trace distance on density matrices tt , a: D(7r, a) = l/2Tr[\'7T — o’]]. 

The mixing time (for the MC P, with stationary distribution tt) has a tight bound, in the time-reversible 
case, proven by Aldous in [10], but which we present in a more detailed form derived from [11]: 

^ 2 ^ log < T(e) < i (log -blog i) (3) 

0 Ze 0 ‘^min ^ 

for TTmin = miUi TTi. 


^ In this work the transition matrices are left-stochastic and act on column-vectors from the left. 

^ The inverse of D always exists, as stationary distributions of irreducible aperiodic Markov chains have non-zero support over the 
entire state space. 
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For more details on Markov chains, we refer the reader to [12]. 

Next, we present the basic elements of Szegedy-style approaches to quantum walks. Part of the presentation 
follows the approach given in [7]. 

Szegedy-style quantum walks can be viewed as walks over a bipartite graph, realized by duplicating the graph 
of the original MC, specified by the transition matrix P. The basic building block is a diffusion operator Up 
which acts on two quantum registers of N states and satisfies 

AT-l 

Up |*)j |0)jj = |*)j \/Pji b)ii ■ (4) 

j=0 

It is easy to see that Up establishes a walk step from the first copy of the original graph to the second. The 
operator Up, and its adjoint are then used to construct the following operator: 

refiA) = Upili®Zu)Ul, (5) 

where Z = 2 |0) (0| — 1, reflects over the state |0). The operator ref {A) is a reflector the subspace A = 
span({C/p |i) [0)}^). A second reflector is established by defining a second diffusion operator, realizing a walk 
step from the second copy of the graph back to the first: Vp = SWAPijiUpSWAPiji. From here, we proceed 
analogously as in the case for the set A, to generate the ref{B) operator, reflecting over B = span({I/p |0) |j)}j). 
The Szegedy walk operator is then defined as W{P) = ref{B)ref{A). In [6, 7] it was shown that the operator 
W{P) and P are closely related, in particular in the case P is time-reversible, which we clarify next. 

Given a distribution tt, we will denote the coherent encoding of the distribution tt with [tt) = X]i=i V^l*)- 
For us it is convenient to define a one-step diffused version of the encoding above, specific to a particular Markov 
chain: [tt') = Up |7r)j G |0)jj, where Up is the Szegedy diffusion operator. It is easy to see that [tt) and [tt') are 
trivially related via the diffusion map (more precisely, the isometry [tt) C/p [tt) ® jO)) and moreover that the 
computational measurement of the first register of [tt') also recovers the distribution tt. By abuse of notation, 
we shall refer to both encodings as the coherent encoding of the distribution tt, and denote them both [tt) , 
where the particular encoding will be clear from context. Next, we clarify the relationship between the classical 
transition operator P and the Szegedy walk operator W{P). Let tt be the stationary distribution of P so Ptt = tt. 
Then the coherent encoding of the stationary distribution tt of P, given with [tt) = UpJ2i I*) |0), is also 
a -1-1 eigenstate of W{P), so W[P) [tt) = [tt). Moreover, on the subspace A + B, so-called busy subspace, it 
is the unique -|-1 eigenstate. On the orthogonal complement of the busy subspace, W{P) acts as the identity. 
Moreover, the spectrum of P and W{P) is intimately related, and in particular the spectral gap S is quadratically 
smaller than the phase gap 

A = min{2|6l||e*®Ga(IF(P)),6»^0}, (6) 

where 0 denote the arguments of the eigenvalues, i.e. eigenphases, of W{P). In other words, we have that 
1/A G 0(1/VS). This relationship is at the very basis of all speed up which stems from employing Szegedy-style 
quantum walks. We refer the reader to [6, 7] for further details on Szegedy-style quantum walks. 

A useful central tool in the theory of Szegedy-style quantum walks is the so-called Approximate Reflection 
Operator ARO{P) k, 2 [tt) (tt] — 1, which approximately reflects over the state [tt). The basic idea for the 
construction is as follows: By applying Kitaev’s phase detection algorithm on W{P) (with precision 0(log(A))), 
applying a phase flip to all states with phase different from zero, and by undoing the phase detection algorithm, 
we obtain an arbitrary good approximation of the reflection operator R{P) = 2 [tt) (tt] — 1, for any state within 
A + B. The errors of the approximation can again be efficiently suppressed by iteration (by the same arguments 
as for the [tt) measurement) [7], so the cost of the approximate reflection operator is in 0(1/A) = 0{l/'/S). 

Thus, the second gadget in our toolbox is the operator ARO{P), which approximates a perfect reflection 
R{P) on A + B, while incurring a cost of 0{\/'/5) calls to the walk operator W{P). 

The ARO{P), along with the capacity to flip the phase of a chosen subset of the computational basis elements, 
suffices for the implementation of an amplitude amplihcation [13] algorithm which allows us to find the chosen 
elements. To illustrate this, assume we are given the state (tt) , the (ideal) reflector R{P), and assume we are 
interested in hnding some set of elements M C {!,...,A}. The subset M is typically specified by an oracular 
access to a phase flip operator defined with^M = 1 — 2^-gg |i) (i|. The element searching then reduces to 


^ While there are infinitely many such operators, any one of them will serve our purpose. 
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iterated applications of ZmR{P) (which can be understood as a generalized Grover iteration, more precisely 
amplitude amplification) onto the initial state [tt) . Let tt denote the conditional probability distribution obtained 
by post-selecting on elements being in M from tt, so 


—, if i e M 
e 

0, otherwise, 


( 7 ) 


with e = 1'^) = V^l*) |0) denote the coherent encoding of tt. Note that the measurement 

of the first register of |7f) outputs an element in M with probability 1. Thus successfully preparing this state 
implies that we have found a desired element from M. 

As it was shown in [7], applications of Zm-, and R{P) maintain the register state in the two-dimensional 
subspace span({|7r), Iff)}), and moreover, using 0(l/i/e) applications of the two reflections will suffice to produce 
a state lip) € span{\TT) , Iff)}, such that | (V’l is a large constant (say above 1/4). Measuring the first register 
of such a state will result in an element in M with a constant probability, which means that by iterating this 
process k times ensures an element in M is found with an exponentially increasing probability in k. However, 
since the state |'(/') is also in span{\TT) , |ff)}, it is easy to see that the measured state, conditional on being in 
M, will also be distributed according to ff. This observation was used in [14], and also in [8] to produce an 
element sampled from the truncated stationary distribution ff, in time 0{l/y/e) x Opl/y/S) where the S term 
stems from the cost of generating the approximate reflection operator ARO{P), and 0(1/V6) corresponds to 
the number of iterations which have to be applied. This is a quadratic improvement relative to using classical 
mixing, and position checking processes which would result in the same distribution. 

However, the same process can be used in reverse to generate the state |7r) starting from some fixed basis state 
|f') = Up |f) |0) with cost 0(l/'/5) x (/(I/i/tt}). The resulting state of the reverse process is constantly close 
to the state |7r), which is our target state. This basic idea was already observed in [3] by Richter, however, at 
first glance it seems prohibitive as the resulting mixing time is proportional to 0(1/^^). If no assumptions are 
made on the stationary distribution, this dependency is lower bounded by 0(l/\/]V), as the smallest probability 
in a distribution is upper bounded by 1/iV. 

For this work, we point out that the preparation process above, which starts from an initial basis state is 
trivially generalized: if 1^/) is any initial state, and we have the capacity to reflect over it, we can reach a state 
close to the target state |7r), with an overall cost 0{l/\/5) x 0{1/F{\%p) , Itt))), where F{\tP) , |7r)) = | {tpl 7r)p 
is the standard fidelity. This follows as the search/unsearch algorithms are in fact amplitude amplification [13] 
algorithms. 

Finally we point out that having a constant-distance approximation of the stationary distribution is effectively 
all we need. Given an approximation, which is constantly far from |7r), an arbitrarily good approximation of 
distance e can be achieved in time 0(1/VS x log(l/e)), by again running phase estimation (iteratively) of 
W{P) on the approximate state, and this time measuring the phase-containing register. This |7r)-projective 
measurement is described in more detail in [15], and it follows from Theorem 6 in [7] . 

We have previously used this idea in [15], to achieve more efficient mixings in the context of slowly evolving 
sequences of Markov chains, where the initial states were either basis states, or a state encoding the uniform 
distribution. 

In this work, we will take this idea significantly further, by intrinsic properties of monotonically decaying 
distributions. Let be a finite state space, and let <n be a total order on fl. Then a distribution d is 
monotonically decaying, relative to the order <n, if for its probability mass function fd we have: x,y £ 
{x <n y) => {fd{x) > fd{y))- In this work we will be representing the state space elements with integers, 
and the order will be the standard order so a distribution tt is monotonically decaying if * < j > tTj, 

monotonically increasing if * < j => < tt^. Finally, a distribution tt is strongly unimodal if there exists a 

k £ n such that for i < j < k we have < TTj and for k < i < j we have tt ^ > TTj. 


III. MAIN RESULT 

The results of the previous section already establish that if we have an (perhaps oracular) access to good 
approximations of the targeted stationary distribution, arbitrarily good mixing by unsearching is efficient. 
Here, we will show that, for the case of monotonically decaying distributions we can construct good initial 
states efficiently, in time 0{\og{N)), independently from the shape of the distribution. Our first result is given 
with the following theorem: 
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Theorem 1. Let N = 2"',n G N be an integer, and let C be the convex space of monotonically decaying 
distributions over {1,...,7V}. Then there exists a set of distributions S C D^, [S'! = n, such that for every 
IT G D^, there exists u G S satisfying 


D['k, i^) < 1 


1 

2(n+ !)■ 


( 8 ) 


Proof. We begin by constructing an Wsized set S' = {(T*}^i of’’ladder” distributions which are extreme points 
of the convex space D^. They are defined as: 


r 1/i iij<i 
\ 0, otherwise. 


( 9 ) 


Any distribution tt G can be represented as a convex combination of distributions in S' as follows: 

N 

n = TTia^+'^TTi{ia’‘— {i — (10) 

i=2 


as the parentheses above contain the Kronecker-delta distribution. The expression above can be, by reshuf¬ 
fling, restated as 

C N-l \ N 

- 7rz+i)a' I + Nttnct^ = ^ qi(j\ (11) 

where, for 1 <i < N qi = ii^Ki — TTi+i) and gjv = Since the distribution tt is monotonically decaying, we 

have that qi > 0, and it is also easy to see that = 1. In other words, q = {qi)i is a distribution as well. 

Using the representation above, we can express the distance between tt and G S' as follows: 

N N 

D{7t, a^) = D{Y, q.cjf a’^) < ^ q,D{a\ a% (12) 

i=l i=l 


where the last inequality follows from the strong convexity of the trace distance, and in fact is a strict equality 
in our case. The distance between individual distributions in S' is easy to express explicitly: 


D{a\a^) = 1 - 


min{i, fc} 
max{i, fc} 


(13) 


If we define the matrix V as Vij 


min{z, fc} 
max{z, fc} ’ 


we can express the trace distance between tt and as: 


D{TT,a'^) = l-vlq, (14) 

that is 1 minus the standard inner product between the column of V, denoted Vk, and the probability vector 
q which uniquely specifies tt. Next, we focus on the log-sized subset S Q S' given with S = {cr*|z = 2^',fc = 
0,... ,n — 1}, and establish a lower bound on minq maxfe(u 2 fc )^(7 by coarse-graining. This will then yield an 
upper bound on the distance between an arbitrary decaying distribution tt and the set S. 

From the definition of the vectors V 2 k, k < n it is easy to see that the following holds: 

{V 2 k)j > 1/2 for all 2^= < / < 2^=+^ (15) 

We can see this more generally as, per definition, for vi, I < N/2 = 2”“^ and I < j < 21 we have that 

{vi)j = - > 7 ^ = 1/2. Next, we introduce the coarse-graining operator C, mapping real vectors over 2" 
J 

elements to real vectors over n 1 elements: 


= (%)"=! with 
qi = gi, and for j > 1, qj = ^ qi. 

Z=23-2 + 1 


(16) 


( 17 ) 
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It is clear that L also maps distributions to coarse-grained distributions. By Eq. (15) we have that 


V 2 k > W 2 k , for k < n, with 


r 1/2 if 2 ^= <j< 2 '=+! 
\ 0 , otherwise, 


( 18 ) 

(19) 


where the inequality is taken element-wise. The ancillary vectors W 2 k just capture the positions where the 
vector V 2 k has entries larger or equal to 1/2, setting those to 1/2 and the rest to zero. But then it follows, for 
0 < fc < n — 1 , that 


V2kq>W2kq, ( 20 ) 

where the inequality is taken element-wise. Next, with we denote the Kronecker-delta distribution with unit 
support only over the element, and the inequalities are element-wise, on the coarse-grained n -|- 1 element 
space. It holds that 


(w^o)? > • £(( 7 ), for I < fc < 2, and (21) 

(ui^fc)g > iAfc +2 • £(g),for 0 < k < n. ( 22 ) 

To see the first claim, note that {wJo)q = l/2(gi- 1 - 52 ), whereas Ai-£{q) = qi and A 2 -£( 5 ) = 52 - For the second 

inequality, note that the left hand side of the inequality sums up all elements from q which lie between positions 

2^ and 2^+^ (boarder points included), and multiplies it with 1/2. The right hand side of the inequality picks 
out the (fc -I- 2)”^^ element of the coarse-grained distribution £(g), which, by definition, sums the entries of q 
between the same boundaries, but not including the lower boundary. Then we have that 

max vl)k-iq> max w^k-iq > - max Ak-Ciq). (23) 


Then, the target min-max expression is also lower bounded by 


min max — A^ • £(g), 

q fce{i,...,ri+i} 2 


(24) 


which is easy to evaluate: £(g) is an arbitrary distribution over n -|- 1 elements, and we are free to optimize the 
overlap of this distribution with all Kronecker-delta distributions on the same space. The minimum is attained 
when all the overlaps are the same, so when £(g) is uniform over the space of n -|- 1 elements, and we have 

ming maxfeg{i_ l/ 2 A^-£(g) = — -—. This also lower bounds our target expression miug maxfc(n 2 fc)^.g, 

2 (n ~\~ 1) 

and proves our claim. □ 

In the proof above we have explicitly constructed the log(iV) distributions from the set S. They are the 
n = log 2 (lV) distributions := , for 0 < fc < n — 1 which have uniform support from the first element 

up to element at the (2^)*^ position. For them to be useful for the quantum mixing algorithm the coherent 
encodings of these distributions have to have an efficient construction, which is the case. Start by initializing 
the n—qubit register (sufficient for encoding distributions over the A = 2 " state-space) in the ’’all-zero” state. 
Then, the fc*^ distribution is achieved by applying the Hadamard gate to the last fc qubits. This realizes the 


state = lO)*^ 


-k) 


which encodes the desired distributions, and the reverse of this process, along 


with the reflection over the ’’all zero” state realizes the reflection over |i/^) efficiently as well. 

A few remarks are in order. First, although we have phrased the result for the case when the state space 
is a power of 2, this is without loss of generality - any decaying distribution over N elements is trivially a 
decaying distribution over the larger set of |’log 2 (A)] elements, where we assign zero probability to the tail of 
the distribution. The trace distance result remains the same, once the ceiling function is applied to the log 
term, hence yields the same scaling. 

Next, note that the Theorem 1, along with the given simple method for preparing the log-sized set of initial 
states already yields an efficient algorithm for the preparation of decaying stationary distributions. To see this, 

assume first we know which distribution out of S minimizes the trace distance, bounded by I-—--r. 

2 (log(A) + 1 ) 
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If Itt) is the coherent encoding of the target stationary distribution, by the known inequalities between the trace 
distance and the fidelity^ we have that: 




1 

(2(log(iV) + l))2- 


(25) 


But then, by the results of section II, we can attain the stationary distribution in time 0(1/VS) x 0(2(log(iV)+l)) 
= 0(I/-\/5) X 0(log(A^)), where the soft-0 (O) part suppresses the logarithmically contributing factor stemming 
from the acceptable error term. However, since we do not know which distribution minimizes the distance, the 
trivial solution is to sequentially run the algorithm for each one. This yields an overall log(7V) factor, yielding 
0(1/y/S) X 0(log^{N)) as the total complexity. We can do slightly better by encasing the entire procedure of 
’’searching for the correct initial distribution” in a Grover-like search algorithm, more precisely, an amplitude 
amplification prodecure. 

To see this is possible, note that whether or not a particular distribution was the correct initial choice can be 
heralded - the [tt) —projective measurement, for instance, reveals whether we succeeded or did not. Moreover, 
the ARO{P) operator itself will help realize the Grover oracle, which flips the phase of all states which are not 
the target distribution. The overall procedure can then be given as follows: 

First, we initialize the system in the state b)i where is the coherent encoding of the 

distribution from the set S. This has a complexity of 0(log^(A^)) in the state-space size, but is independent 
from d. Then, in quantum parallel, we run the quantum mixing algorithm on register II (with complexity 
0{l/yf5) X 0(log(A^))), followed by one application of the ARO{P), followed by an un-mixing (the running of 
the mixing algorithm in reverse). This will, approximately (and up to a global phase of —1), introduce a relative 
phase of —1 at those |j) terms, where the searching procedure yielded a success. This constructs the phase-flip 
operator. 

The remainder is the operator which flips over the state |j)i which has a cost of 0{\og^{N)). 

Since at least one distribution, by the correctness of our mixing algorithm, yields the target distribution, this 
extra layer of amplitude amplification needs to be run in a randomized fashion, (since only the lower bound 


is known) [13], on the order of yAog(N) times, until the correct initial distribution is found. The overall 
complexity, is then given with O x log^^^(A^)^ +0 (log^^'^{N)j . The the error factor (multiplying both 


additive terms) which we have for simplicity omitted, and which guarantees that the distance from the target 
distribution is within e in the trace distance, is given with 0(log(l/e) -I- loglog(A^)). The additional loglog(A^) 
term stems from the fact that the ARO{P) operator is applied 0{\og{N)) many times which accumulates errors. 
However, since the effective total error is given by the union bound, it will suffice to rescale the target precision 
by e := e/log(iV), which yields the log log term [7]. In practice, l/yf5 tends to dominate log(A^), thus we have 


the complexity O x log^^^(iV)^, omitting the logarithmically contributing error terms. 


One of the features of our approach is that the actual output of the protocol is a particular coherent encoding 
of the target probability distribution. The classical probability distribution can then be recovered by a mea¬ 
surement of the output state. Having such a quantum output is desirable if our protocol is to be embedded in 
a larger algorithm where the preparation is just an initial step. Examples where this is assumed include hitting 
algorithms [7, 8], and algorithms which require where from a (renormalized) part of the distribution [8, 14]. 
We point out that this property is not a necessary feature of all quantum algorithms for mixing - there are 
promising approaches which utilize decoherence to speed up mixing [3], which may preclude a coherent output. 
The property that the output is a coherent encoding of the target distribution is also maintained in extensions 
of our protocol, which we describe in the next section. 


IV. LOWER BOUNDS AND EXTENSIONS 

The approach we have described in the previous section trivially extends to monotonically increasing distri¬ 
butions as well - since the trace distance is invariant under the simultaneous permutations of the elements of 
the inputs, the same proof holds, where we use ’’ladder distributions” which are reversed in the order of the 
probabilities. However, the approach can be further extended to strongly unimodal distributions, and beyond. 


® Note that the total variation distance on the distributions, corresponds to the trace distance of the incoherent encodings of 
probability distributions, whereas we are interested in the fidelity of the coherent encodings. However, the Uhlmann fidelities of 
coherent and incoherent encodings are equal, so the standard bounds do apply. 





if additional knowledge about the target stationary distribution is assumed. At the end of this section, we will 
explain how such extensions can be obtained. Before this we will address two natural theoretical questions 
which arise from our approach. 

First, in the previous section we have only provided an upper bound of the distance of the set S and an 
unknown monotonically decaying distribution. A-priori, it is not inconceivable that, for the restricted case of 
monotonically decaying distributions, it may be possible that there exists a significantly better choice, with a 
better bound - perhaps achieving a constant distance, instead of a log(7V) dependance. Here we will show that 
a significant improvement of our result is not possible. 

Second, in our setting we have assumed a specific type of prior knowledge of the target stationary distribution. 
It is a fair question whether such knowledge, along with the capacity to prepare particular initial distributions, 
may already offer a significant speed up in the case of classical mixing. If this were the case, our result should 
not be considered as a true speed up of classical mixing. However, we show that the type of assumption we 
impose for the quantum algorithm does not help in the classical approach. 


A. Lower bounds 

The cornerstone of our result relies on the fact that there exists a log(IV)-sized set of distributions in D^, 
which is no more than log^(IV) far (in terms of fidelity) from any distribution tt in D^. It is a fair question 
whether the log{N) dependence can be dropped altogether, and be replaced by a constant, in the complex¬ 
ity of the mixing algorithm. A necessary precondition for this, in the case of our approach, is the following claim: 

Claim 1 There exists a constant 0 < r; < 1, and a family of (arbitrary) distributions one for each 

state space size N, such that for every N G N, and for every tt £ we have that < -q. 

If Claim 1 were to be true, and if the coherent encodings of distributions were efficiently constructable 
(say in time 0(polylog(7V))), then this would constitute a significant improvement over our result. To get 
a bit of intuition, consider a generalization of Claim 1, where tt^ are arbitrary distributions. In this case 
the claim clearly does not hold. Consider any family Then for A £ IN, let fj-min = minj)i 

be the smallest probability occurring in and let jmin = be the position of the smallest 

probability. Then we can choose tt to be the Kronecker delta distribution at position jmin which yields the 
distance 1 — Umin > 1 — ^/N, which converges to 1 with the state space size N. 

Unfortunately, similar simple arguments cannot be straightforwardly utilized in the case when tt is in 
and the proof is a bit more involved. Our theorem (stated later) is the negation of Claim 1, and we prove it by 
contradiction. 

First, we show that Claim 1 implies a seemingly stronger claim denoted Claim 2: 

Claim 2 There exists a constant 0 < 77 < 1, and a family of distributions defined for any 

state space size N, such that for every iV £ DM, and for every tt £ we have that D{q,^^\TT) < 77 . 

The difference between Claim 1 and Claim 2 is that in Claim 2, the family is in the sets of monotonically 
decaying distributions. We prove this generically by sorting. Let Sort be a map from the set of distributions 
to the set of decaying distributions, which, for any distribution tt outputs a distribution tt' where the entries 
of tt' are sorted in a decreasing fashion. While the sorted distributions may not be unique, we may, assume 
Sort picks a unique sorting which preserves the original ordering in the case multiple positions (states) have the 
same probability. Then the following lemma holds: 

Lemma 1. Let 77 he an arbitrary N—state distribution and let tt £ be a monotonically decaying distribution. 
Then D{TT,fi) > Z)( 7 r, Sort(/i)). 


Proof. We will prove that switching any two (adjacent) elements i,j in the distribution 77 , which do not obey 
fJ-i > Ttj can only decrease the distance from tt. This suffices for our lemma, as iterating such switching upon the 
distribution (in an, for us unimportant order) 77 constitutes the well-known Bubble sort algorithm for sorting, 
which converges to a sorted distribution (list) in at most steps. Since each step only decreases the distance, 
by transitivity we have our claim. 
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We prove this by direct verification. Let i, j = i + 1 be such that fii < and let /r' be the distribution 
obtained from /i by switching labels i and j. Then we have: 


D{^,Tt) = ^ l/2|^fc - TTfcl + l/2(|//i - TTil + l/Tj - TTjl) (26) 

k^i and k^j 

D{lJ-',Tr)= l/2\^lk-^Tk\ + ^/‘2{\^lj-^^,\ + \^J,i-^Tj\), (27) 

k^i and k^j 
SO 

2{D{p,tt) - D{p',tt)) = \ni - 7r,| + \nj - tt^] - {\nj - 7r,| + - tt^I) (28) 

= |a;| + |x + (e + i5)| — |x + e| — |a; + (5| (29) 

where x = — iTi, e = fj,j — pLi > 0 and S = ni — iTj >0. We can without the loss of generality assume e < S. 

We have five possibilities: a) x > 0; b) 0 > x > —e, c) —e > x > —S; d) —e — S < x < —S, and e) x < —S — e. 

In the case a) the difference is zero, so the claim holds. In the case b) we have 

— x + x + e + 6 — x — e — x — 6 = —2x > 0. (30) 

In the case c) we have: 

— x + x + e + (5 + x + e — X — (5 = 2e>0. (31) 

In the case d) we get —x + x + e + 5 + x + e + x + 5 = 2(e + d) + 2x > 0. 

Finally, in e) we have —x — x — e — d + x + e + x + (5 = 0. This proves the lemma. □ 

Thus, for our main goal, it will suffice to show that Claim 2 does not hold. 


Next, note that Claim 2 also implies the claim that the distributions v are at most constantly far from all 
’’ladder” distributions defined over the same state space. Moreover, by the extremity of these distributions, 
the inverse holds as well - ly being at most 77 far from all the {cr*}i distributions also implies that /i is at most 
77 far from any monotonically decaying distribution. However, we shall not be using the inverse claim. 

In the remainder we will assume /i is a decaying distribution over N states, and also that {cr*}i are the 
’’ladder” distributions we have described in Eq. (9). 

Recall that /i (since it is in D^) can be represented in the convex-linear basis of (c.f. Eq (11)): 

N 

fi = '^q,a\ (32) 


where for 1 < i < N qi = iiiTi — tt^+i) and q^ = Nirpf. As we have seen, as fi is decaying, we have that q = {qi)i 
is a probability distribution as well, uniquely specified for every 77 . Retracing the steps of Theorem 1, we have 


also dehned vectors with {vk)i = 

and as: 


min{ 7 , k} 
max{ 7 , k} 


using which we can express the trace distance between /i 


= 1 - vlq. 


(33) 


Recall, the matrix V collects the vectors Vk as rows (and columns, since it is symmetric). Then the k*^ row 
of the vector Vq equals 1 — D{ii, The claim we seek to show is that max^ min^ that is the maximal 
overlap of the Vk vectors, optimized over all probability distributions q depends on N and decays to zero in 
the limit of inhnite state space. To establish this, we will use the specific properties of the matrix V and a 
few simple results from the theory of convex spaces. First, to remind the reader, the matrix V is defined as 

miiii z ? r 

Vii =-—r, thus it is a symmetric matrix. Moreover, as we will show later, it is also also invertible. In the 

max{qj}’ 

language of convex spaces this simply means that a point in a convex set is uniquely specified by the distance 
of that point from the extreme points of the convex set. Next, intuitively, the minimum of distances from the 
points Vk (understood as points in an dimensional Euclidian space) is attained by a point in the convex hull 
of those points, which is equally far from all of them. However, the validity of this claim can depend on the 
choice of the distance measure. We shall thus prove this claim for our case. Eor the remainder of the proof, we 
will have to evaluate the distance attained, that is, the value of max^ min^ V.q. Eor this, we will use the explicit 
inverse of the matrix H, which we give as a separate lemma for clarity. 
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Lemma 2. Let N € N, and let V be an N x N matrix defined with Vtj = min{i,j}/max{i, j}. Then V ^ 
exists, and it is a symmetric, diagonally strictly dominant, tri-diagonal matrix specified with: 


[y- 

iov i < N , ■ ■ = 


i{i + 1)^ 


2i + 1 ’ 

Ai^ 


(2i- l)(2i + l)’ 
iV2 


and 




N,N 2N -I' 


(34) 

(35) 

(36) 


Proof. To see that V multiplied with V~^, as defined above, is the identity can be easily checked by examining 
the diagonal and non-diagonal elements of VV~^ separately, so we omit this here. What remains to be seen 
is that V~^ is diagonally dominant. For the column (also row, since it is symmetric) i = 1 it is obvious. For 
1 < i < we have that: 


^ i{i 1) ^ i{i — 1) 

(2i - l)(2i-b 1) ^ 2i-bl 2i- 1 
4i^ ^ i{i A-l)(2i - 1) A-i{i - l)i2i A-1) 

{2i - l)(2i1) ^ (2i-bl)(2t-l) 

4^2 > 2 ( 2^2 - 1 ) 2i^ >2i^ -I 

so the the inequality is strict. Finally, for i = N, we have that 

^ N{N-l) 

2N- 1 ^ 2iV - 1 ’ 

which is also a strict inequality. Thus the lemma holds. 

We now claim that since V~^ is strictly diagonally dominant the optimal value max^minfeU^g is attained 
at a g for which Vq = a(l,...,l),a € R”*" - in this case, the distribution encoded by g is equally far from all 
ladder distributions. 

We prove this claim by contradiction. The negation of this claim implies that there exists a probability 
distribution g' such that Vq' >Vq (that is, the distribution encoded by q' is closer to all ladder distributions), 
where the inequality is taken element - wise. Let V{q' — q) = y with j/ = (j/i,..., yjv)^. Note, y is by assumption 
a non-negative vector, with at least one positive entry. Then (g — g') = V~^y. Note that since g and q' are 

distributions, the sum of all elements of the vector (g — g') is 0. Then, by multiplying both sides of the last 

equality with the row (!,...,!) from the left we get 

G = {l,...,l)V-^y. (41) 

Since V~^ is strictly diagonally dominant (both column and row-wise since it is symmetric), the row 
(1,..., 1)1^“^ is a strictly positive row vector. But then the inner product (1,..., l)V~^y must be strictly 
larger than zero, which is a contraction with Eq. (41). 

Thus g such that Vq = a(l,..., 1)^ maximizes the minimal overlap (inner product between rows of V and 
g). Finally, we evaluate a. By applying V~^ from the left from the last equality, and multiply both sides with 
the row (!,...,!) from the left, since g is a probability vector, we get: 

l = (42) 

In other words, a is the inverse of the sum of all the elements in We then have that: 


(37) 

(38) 

(39) 

(40) 

□ 


1/a 


rN-i 


4P 


(2*-l)(2* + l)^ 


iV2 

2N- 1 


N-l 


i{i -b 1) 


2i + l 


(43) 
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The expression above can be much simplified. We have that: 


N-l 


= E 

2=1 


(2i + 1) 


N-l 


l/a = 2j2 


2=1 


2i^ 


i{i + 1) 


{2i- l)(2i + 1) 2i + l J 2N -I 


iV2 


N-l 


= - 2 E 




= -2i'EEii)-E 


^ \(2i - l){2i + 1) J 2N-1 
i \ 


AN-2 ^ (2i - l)(2i + 1) J 2N-1 

N-l 


= y _; 


2i-l 


N-l 


N 


N-l 


i = E 


^V(2* + l) (2*) 


(2i-l)(2i + l) 2N-1 2N-1 

2=1 
N-l 


1 


1 1 


E ——- -j-1 — H 2 N — — — Hn 

2N 2 


2=1 


2N 




— H 2 N — 2 ^^ ~ '^^N-1/2 + log(2), 


where is the (generalized) Harmonic number function, for a; > 0 defined with 




fc=i 


1 


1 


k k + X J 


Thus we have that 


1/a > -HAr- 1/2 + log(2). 


(44) 

(45) 

(46) 

(47) 

(48) 

(49) 

(50) 

(51) 


Harmonic numbers and the natural logarithm have a well-understood relationship, and for our purposes it 
will suffice that Hj>f_i /2 > \og{N). It follows that 


1/a > - log(lV) 


a < 


(52) 

(53) 


log(iV)' 

The last inequality immediately implies our claim as, putting all the observations together, we then have that: 

2 


min max D{tt^ a) > 1 — 


<y(iDN 


log(lV)’ 


(54) 


where denotes the set of all iV—state distributions. That is, the optimal family of distributions, one for 

2 , . . . . > 

each state space size, will be at least 1-^—r far from the worst case distance from a distribution in Hvr. 

log(iV) ^ 

This shows not only that the Claim 1 does not hold, but that our algorithm is not far from optimal, and that 
if one had access to an oracle (or an efficient construction), of the coherent encodings optimal distributions, 
the overall algorithm would have the efficiency depending on log^^^(iV). In contrast, we constructively achieve 
log^'^^(iV). While the first scaling is clearly better, the difference is moderate, given that both are polylog. For 
completeness, we phrase the results above as a theorem. 

Theorem 2. For any family of distributions each defined over the state space of size 1 < N G IN, 

and for every N, there exists a monotonically decaying distribution tt £ D^-, such that 

2 


D{TT,yL^y > I 


log(iV) ■ 


(55) 
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We end this subsection with a remark on the V matrices, as a curiosity. We have in the proof above, shown 
that the inverse of a V matrix is, technically speaking, an inverse of a strictly diagonally dominant Stieltjes 
matrix, hence also an M-matrix. The problem of characterizing non-negative matrices, whose inverse is an 
M-matrix has attained significant interest in the field of matrix analysis [16]. One result presented in that 
research is a characterization of so-called generalized ultrametric matrices, the subset of which is shown to, by 
inversion, yield diagonally dominant Stieltjes matrix, as is V~^ in our case. It is curious to note that, however, 
the V matrices we use do not fall into the class of the characterized generalized ultrametric matrices, thus may 
have independent interest in the field of matrix analysis. 


B. Classical mixing for decaying distributions 


The results of Section II show that, in the case of quantum mixing through un-searching, the overall mixing 
time strongly depends on the initial state - the mixing time is proportional to the inverse of the square-root 
of the fidelity between the initial state and the targeted stationary distribution state. A similar statement 
holds when we consider the trace distances between the initial distribution (encoded by the quantum state) 
and the stationary distribution. This is clear as the trace distance (of classical distributions), and fidelity (of 
their coherent encodings) are tightly connected. Moreover, this dependence of the mixing time on the distance 
between the initial and target state is robust - regardless of what particular initial state we pick, the mixing 
time just depends on the distance. 

In the classical case, intuitively it is also clear that starting from a distribution, which is close to the target 
distribution, must speed up mixing. As an extreme example, if we wish to achieve mixing within e, and we are 
given an initial state which is already within e from the target, the mixing time (in the sense of the number 
of required applications of the walk operator) is zero. The question is, is the improvement as robust in the 
classical case, as it is in the quantum? Here we show that in the classical case, it is not, and being close to the 
target distribution helps just moderately. To show this we first clarify a fact about classical mixing times, the 
definition of which we repeat for the benefit of the reader. The mixing time ^(e), within error e, for Markov 
chain P, with stationary distribution tt is defined as: 


"(e) = min {t\D{P^a, tt) < e, Vct} 


(56) 

The 


where D{Tr,a) denotes the total variational distance on distributions 7r,CT, so D{'k,(t) = 1/2 Wj ~ ^j\ 
mixing time asks that the state P*tT be e close to tt for all initial states cr, that is, it looks for the worst case 
initial a. By convexity, and the triangle inequality, the worst case initial state cr will be a Kronecker-delta 
distribution with total mass at some state space element. 

We can introduce an analogous mixing time quantity, relative mixing, which extends standard mixing time 
in that the initial state is guaranteed to be within rj from the target state: 


Trj{e) = min {t\D{P*a, tt) < e, Vcr s.t. D{a, tt) < ry} . 


(57) 


Now, suppose we are given a Markov chain P, and we wish to evaluate a bound on T^(e) for this Markov 
chain. In order to capture robust properties, the definition above asks for the worst case as well (as the distance 
requirement it must hold for all a s.t. Z?(ct, tt) < rj), so to bound the relative mixing times, we can construct 
the following distribution p : 


p = (1 - ry)7r -I- rjau 


(58) 


where we choose ahorse to be the worst-case initial state for MC P if we wish to mix it within e/rj. Then we 
have that: 


i||(l - 77)7r -I- rja^orse - ttII = rj^Wa^orse - Trjl < 77, 
so p is within rj distance from tt, as required. Now, we are looking for an integer t > 0, such that: 

\\\P*p — 7r|| < e. 


(59) 


(60) 


Then we have: 
1 


-||PV- ’’'ll = 2IKI “ ■n)P + VP or se “ 7r|| = - 1| (1 - 77)77 -t pP (Jworse “ 77 || = r]D{P* ayjorse, T^) , (61) 
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hence, we require a t such that 


D{P^ ahorse, T^) < “■ (62) 

V 

However, since ahorse was chosen to be the worst case state for mixing within —, we have that 

V 

Tr,{e) > T{e/r]). (63) 

Thus the lower bound of the relative mixing time is just the standard mixing time, where e is replaced with 
e/ 77 . Thus, we get the following lower bounds for relative mixing: 

^ 2 ^ log < T(e/ 77 ) < r^(e). (64) 

It is now clear that the relative mixing has the same dependence on 1/6, hence the improvement is slight. To 
make a fair comparison to the quantum mixing case we have shown, we can set 77 = 1/2 (for our algorithm, the 
trace distance is always larger than this), and see that the lower bound for classical mixing is lower bounded by 
0(l/(51og(l/(4e))), which is essentially the same scaling, as for standard mixing time. 

For completeness, we point out that there have been prior works asking a related question, which establish 
that the mixing time is an essentially robust quantity, independent from the setting (that is, in what context) 
the mixing is applied. We refer the reader to [17] for a collection of such results. 


C. Extensions 

While the main theorem we have used in our approach assumes monotonic (decaying or increasing) distribu¬ 
tions, this can be easily further extended. Assume, for instance, we know that tt, the target distribution over 
iV = 2" elements only decays to some element k, its behavior is unknown from that point on, and the total 
support up to element k is p = Consider for the moment the truncated distribution tt, obtained by 

setting all probabilities after k to zero and re-normalizing (by multiplying with 1/p.) By Theorem 1, we know 
that there exist a (efficiently constructable) log-sized set S of ’’ladder” distributions over N, such that for at 
least one of them, a, it holds that D{a,Tr) < 1 — {n + l)“^/2. But then we have, by the triangle inequality, 
homogeneity of the trace norm, and the fact that the maximal distance is unity, that: 

D{a, tt) < pD{a, n) + {1 - p) = 1 - . (65) 

This implies that the total complexity of the mixing algorithm we have described, applied to this setting will 
be increased, multiplicatively, by p~^. Note that the same reasoning will hold, in the mirrored case, where we 
know that tt is increasing from some element k, with corresponding support of p. 

This simple observation already allows us to efficiently prepare target distributions whose probability mass 
functions are convex (decaying to some element, and increasing from that element). To see this note that either 
the mass of the distribution prior its minimum, or after, must be above or equal to 1/2. Thus, we can simply 
run the algorithm assuming both options which then yields just a constant multiplicative overhead of 4 (two 
runs x(l/ 2 )-i). 

Another extension of this observation is the case where relative to a known order, the distribution is, for 
a known contiguous subset of the state space elements (with total weight p), decaying or increasing. In this 
case as well, the mixing time only suffers a 1/p pre-factor. In particular, this implies that distributions which 
are strictly unimodal (meaning increasing to some element, and decreasing from that element) can also be 
efficiently prepared, provided the mode k is known. To see this holds, note that in the strictly unimodal case 
as described, the total mass of the probability distribution either up to the k*^ element, or after, must be 
above 1/2. Then, the ladder distributions would only be constructed to the element, which again can be 
done with polylog(A^) overhead. Unfortunately, for this case, the knowledge of the position of the mode k is 
neccessary - strictly unimodal distribution also contain the Kronecker-delta distribution. For our approach the 
capacity to efficiently mix to (a distribution arbitrarily close to) an arbitrary Kronecker-delta distribution would 
immediately imply efficient mixing for all distributions. This is beyond what we can claim. 
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V. DISCUSSION 

In this work, we have addressed the problem of attaining stationary distributions of Markov chains using a 
quantum approach. We have built on observations, originally made by Richter and Childs, that quantum hitting 
algorithms run in reverse can serve as mixing algorithms. These observations initially received little attention 
due to their apparent inefficiency - an a-priori square-root scaling with the system size N. We have shown, in 
contrast, that in the cases when it is beforehand known that the target distribution is decaying, relative to a 
known order on the state space, the dependency on the system size is only log^'^^(iV). We have also shown the 
essential optimality of this bound for our approach, in particular, that an explicit dependence on the system 
size is unavoidable and logarithmic. Following this we have shown how our approach easily extends to a much 
wider class of distributions, including concave distributions, but also strictly unimodal distributions, where the 
position of the mode is known. Unfortunately, it is the case that such assumptions are often not satisfied in 
many physics-inspired applications which require mixing of Markov chains. For instance, in statistical physics, 
the mode is often the quantity explicitly sought, when the distribution is known to be unimodal. In other uses, 
e. g. the computation of a permanent of the matrix, the underlying state space is not simply characterized at 
all, and knowing the order would already imply the solution to the problem. 

Nonetheless, other applications involving Markov chain mixing, such as artificial intelligence [14] and applica¬ 
tions relying on bayesian inference (which often rely on MCMC) may have more instances where our approach 
may yield a genuine quantum speed-up. Moreover, the quantum algorithm we have provided realizes a coherent 
encoding of the stationary distribution, which can be used as a fully quantum subroutine, for instance in the 
preparation of initial states in e.g. hitting algorithms [7, 8]. 

Another possibility includes settings where the shape of the target distribution is known (say Gaussian), 
and the mode is also, however, we are interested in learning higher moments of the distribution by mixing 
and sampling. For instance, all correlations of Gaussian states in quantum optics are captured by the second 
moments, whereas the mode and mean coincide, and reveal nothing about correlations. We leave the applications 
of our results for future work. From a more theoretical point of view, the results of this work highlight another 
difference between classical and quantum mixing, in particular, the approaches which rely on reversing hitting 
algorithms. In the classical mixing case, the choice of the initial state does not substantially contribute to 
overall mixing efficiency. In contrast, in the quantum case, improvements in the choice of the initial state can, 
as we have shown, radically alter the overall performance. 

While the conjecture that quantum approaches to mixing can yield a generic quadratic speed-up in all cases 
remains open, our approach extends the class of Markov chains for which such a speed up is possible. Notably, 
unlike in other studied cases where speed up has been shown, our assumptions lay only on the structure of the 
stationary distribution of the Markov chain, rather directly on the structure (underlying digraph) of the Markov 
chain itself. 
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